Overview

Dataset statistics

Number of variables: 13
Number of observations: 10692
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 358
Duplicate rows (%): 3.3%
Total size in memory: 1.1 MiB
Average record size in memory: 104.0 B
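The duplicate-row figures can be reproduced with a minimal sketch. It assumes the report counts every occurrence of a row after its first one as a duplicate, which is the default `keep="first"` behavior of pandas' `DataFrame.duplicated()`. The helper `duplicate_stats` and the toy records are hypothetical.

```python
from collections import Counter


def duplicate_stats(rows):
    """Return (number of duplicate rows, percentage of all rows).

    Assumption: each occurrence of a record after its first one counts as
    a duplicate, like pandas' DataFrame.duplicated() with keep="first".
    """
    if not rows:
        return 0, 0.0
    counts = Counter(rows)  # records must be hashable, e.g. tuples
    n_duplicates = sum(c - 1 for c in counts.values())
    return n_duplicates, 100.0 * n_duplicates / len(rows)


# Hypothetical toy records (city, area, rooms): the first one appears three times.
toy_rows = [
    ("São Paulo", 20, 1),
    ("São Paulo", 20, 1),
    ("Porto Alegre", 47, 1),
    ("São Paulo", 20, 1),
    ("Campinas", 110, 3),
]
print(duplicate_stats(toy_rows))  # (2, 40.0)
```

Applied to this report's totals, 358 / 10692 ≈ 3.35%, which is displayed rounded to one decimal place as 3.3%.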

Variable types

NUM: 9
CAT: 4

Reproduction

Analysis started: 2020-05-14 00:19:13.108045
Analysis finished: 2020-05-14 00:19:29.505963
Duration: 16.4 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Dataset has 358 (3.3%) duplicate rows [Duplicates]
fire insurance (R$) is highly correlated with rent amount (R$) [High correlation]
rent amount (R$) is highly correlated with fire insurance (R$) [High correlation]
total (R$) is highly correlated with hoa (R$) [High correlation]
hoa (R$) is highly correlated with total (R$) [High correlation]
area is highly skewed (γ1 = 69.59680369) [Skewed]
hoa (R$) is highly skewed (γ1 = 69.03938119) [Skewed]
property tax (R$) is highly skewed (γ1 = 96.01359411) [Skewed]
total (R$) is highly skewed (γ1 = 58.96080292) [Skewed]
parking spaces has 2683 (25.1%) zeros [Zeros]
hoa (R$) has 2373 (22.2%) zeros [Zeros]
property tax (R$) has 1596 (14.9%) zeros [Zeros]

Variables

city
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 83.5 KiB
Value | Count | Frequency (%)
São Paulo | 5887 | 55.1%
Rio de Janeiro | 1501 | 14.0%
Belo Horizonte | 1258 | 11.8%
Porto Alegre | 1193 | 11.2%
Campinas | 853 | 8.0%
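The value-count tables in this report pair each value with its count and its share of the 10692 observations. Any remaining values are pooled into an "Other values (k)" row. Below is a minimal sketch of that layout using hypothetical helpers `frequency_table` and `format_share`. The "< 0.1%" rule (non-zero shares that round to 0.0%) is an assumption inferred from the entries in this report, not taken from the pandas-profiling source.

```python
from collections import Counter


def format_share(count, n):
    """Format count / n as a percentage with one decimal place.

    Assumption: non-zero shares that would round to 0.0% are shown as
    "< 0.1%", which matches the entries in this report.
    """
    pct = 100.0 * count / n
    if count > 0 and round(pct, 1) == 0.0:
        return "< 0.1%"
    return f"{pct:.1f}%"


def frequency_table(values, top_n=10):
    """(value, count, share) rows for the top_n most common values, plus an
    'Other values (k)' row that pools the remaining k distinct values."""
    n = len(values)
    ranked = Counter(values).most_common()  # sorted by count, descending
    rows = [(str(v), c, format_share(c, n)) for v, c in ranked[:top_n]]
    rest = ranked[top_n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, format_share(other, n)))
    return rows


# Rebuild the city column from the counts reported above.
city_counts = {"São Paulo": 5887, "Rio de Janeiro": 1501, "Belo Horizonte": 1258,
               "Porto Alegre": 1193, "Campinas": 853}
cities = [name for name, c in city_counts.items() for _ in range(c)]
for row in frequency_table(cities, top_n=3):
    print(row)
```

With `top_n=3`, the last row pools Porto Alegre and Campinas as `('Other values (2)', 2046, '19.1%')`.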
 

Length

Max length: 14
Median length: 9
Mean length: 10.54517396
Min length: 8
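The Length statistics can be checked with a minimal sketch that counts characters per value. It assumes that length means the number of Unicode characters (code points after NFC normalization). `length_summary` is a hypothetical helper, and the city column is rebuilt from the counts in the table above. By hand, the mean is (5887·9 + 1501·14 + 1258·14 + 1193·12 + 853·8) / 10692 = 112749 / 10692 ≈ 10.545.

```python
import statistics
import unicodedata


def length_summary(values):
    """Max, median, mean and min length of a list of strings.

    Lengths are counted in Unicode code points after NFC normalization,
    so "São Paulo" has length 9 however the "ã" was encoded.
    """
    lengths = [len(unicodedata.normalize("NFC", v)) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
    }


city_counts = {"São Paulo": 5887, "Rio de Janeiro": 1501, "Belo Horizonte": 1258,
               "Porto Alegre": 1193, "Campinas": 853}
cities = [name for name, c in city_counts.items() for _ in range(c)]
print(length_summary(cities))
```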

area
Real number (ℝ≥0)

SKEWED

Distinct count: 517
Unique (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 149.21791994014217
Minimum: 11
Maximum: 46335
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 11
5th percentile: 30
Q1: 56
Median: 90
Q3: 182
95th percentile: 400
Maximum: 46335
Range: 46324
Interquartile range (IQR): 126
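The quantile statistics above can be recomputed with a short sketch. It assumes the report interpolates linearly between order statistics, which is the default of pandas' `Series.quantile` and NumPy's `percentile`. That assumption is consistent with fractional entries elsewhere in this report, such as Q3 = 1237.5 for hoa (R$). `quantile` and `quantile_statistics` are hypothetical helper names.

```python
import math


def quantile(values, p):
    """p-th quantile (0 <= p <= 1) with linear interpolation.

    The sorted data are indexed at position h = (n - 1) * p and the result
    is interpolated between the two neighbouring order statistics.
    """
    if not values:
        raise ValueError("quantile of empty data")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_statistics(values):
    """The quantile block of the report for one numeric column."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "minimum": min(values),
        "p5": quantile(values, 0.05),
        "q1": q1,
        "median": quantile(values, 0.5),
        "q3": q3,
        "p95": quantile(values, 0.95),
        "maximum": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


print(quantile_statistics([1, 2, 3, 4]))
```

For area, the range and IQR follow directly from the block above: 46335 - 11 = 46324 and 182 - 56 = 126.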

Descriptive statistics

Standard deviation: 537.0169423
Coefficient of variation (CV): 3.598877015
Kurtosis: 5548.308334
Mean: 149.2179199
Median Absolute Deviation (MAD): 45
Skewness: 69.59680369
Sum: 1595438
Variance: 288387.1964
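The descriptive statistics follow standard formulas:

- CV = standard deviation / mean.
- MAD = median of |x - median|.
- Skewness and kurtosis are built from the second to fourth central moments.

The sketch below assumes the report uses pandas' bias-adjusted sample estimators, `Series.skew()` and `Series.kurt()` (the latter reports excess kurtosis). It also assumes the n - 1 divisor for the standard deviation. The helper name is hypothetical.

```python
import math
import statistics


def descriptive_statistics(values):
    """CV, MAD, skewness and excess kurtosis of a numeric sample.

    Assumption: skewness and kurtosis use the bias-adjusted sample
    estimators (G1, G2) that pandas' Series.skew() and Series.kurt() use.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])

    # Central moments with divisor n.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {
        "cv": sd / mean if mean != 0 else math.nan,
        "mad": mad,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "variance": sd ** 2,
    }


print(descriptive_statistics([1, 2, 3, 4, 100]))
```

As a check against the figures above, 537.0169423 / 149.2179199 ≈ 3.599 (the reported CV), and 537.0169423² ≈ 288387 (the reported variance).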
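[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
50 | 334 | 3.1%
70 | 329 | 3.1%
60 | 297 | 2.8%
100 | 253 | 2.4%
80 | 253 | 2.4%
40 | 221 | 2.1%
90 | 209 | 2.0%
200 | 193 | 1.8%
45 | 189 | 1.8%
120 | 183 | 1.7%
Other values (507) | 8231 | 77.0%

Minimum 5 values

Value | Count | Frequency (%)
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 2 | < 0.1%
15 | 19 | 0.2%
16 | 16 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
46335 | 1 | < 0.1%
24606 | 1 | < 0.1%
12732 | 1 | < 0.1%
2000 | 2 | < 0.1%
1600 | 2 | < 0.1%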

rooms
Real number (ℝ≥0)

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.506079311634867
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 3
95th percentile: 4
Maximum: 13
Range: 12
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.171266254
Coefficient of variation (CV): 0.4673699865
Kurtosis: 1.487658631
Mean: 2.506079312
Median Absolute Deviation (MAD): 1
Skewness: 0.7023905761
Sum: 26795
Variance: 1.371864638
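[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
3 | 3269 | 30.6%
2 | 2975 | 27.8%
1 | 2454 | 23.0%
4 | 1586 | 14.8%
5 | 288 | 2.7%
6 | 68 | 0.6%
7 | 36 | 0.3%
8 | 11 | 0.1%
10 | 3 | < 0.1%
13 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 2454 | 23.0%
2 | 2975 | 27.8%
3 | 3269 | 30.6%
4 | 1586 | 14.8%
5 | 288 | 2.7%

Maximum 5 values

Value | Count | Frequency (%)
13 | 1 | < 0.1%
10 | 3 | < 0.1%
9 | 1 | < 0.1%
8 | 11 | 0.1%
7 | 36 | 0.3%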

bathroom
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.2368125701459034
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.407198198
Coefficient of variation (CV): 0.6291086777
Kurtosis: 1.134852401
Mean: 2.23681257
Median Absolute Deviation (MAD): 1
Skewness: 1.213809657
Sum: 23916
Variance: 1.980206769
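[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
1 | 4301 | 40.2%
2 | 2910 | 27.2%
3 | 1474 | 13.8%
4 | 1111 | 10.4%
5 | 578 | 5.4%
6 | 215 | 2.0%
7 | 85 | 0.8%
8 | 11 | 0.1%
9 | 4 | < 0.1%
10 | 3 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 4301 | 40.2%
2 | 2910 | 27.2%
3 | 1474 | 13.8%
4 | 1111 | 10.4%
5 | 578 | 5.4%

Maximum 5 values

Value | Count | Frequency (%)
10 | 3 | < 0.1%
9 | 4 | < 0.1%
8 | 11 | 0.1%
7 | 85 | 0.8%
6 | 215 | 2.0%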

parking spaces
Real number (ℝ≥0)

ZEROS

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6091470258136924
Minimum: 0
Maximum: 12
Zeros: 2683
Zeros (%): 25.1%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 2
95th percentile: 5
Maximum: 12
Range: 12
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.589520724
Coefficient of variation (CV): 0.9878032885
Kurtosis: 2.769074701
Mean: 1.609147026
Median Absolute Deviation (MAD): 1
Skewness: 1.487534127
Sum: 17205
Variance: 2.526576131
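[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
1 | 3630 | 34.0%
0 | 2683 | 25.1%
2 | 2070 | 19.4%
3 | 968 | 9.1%
4 | 789 | 7.4%
5 | 230 | 2.2%
6 | 163 | 1.5%
8 | 123 | 1.2%
7 | 33 | 0.3%
10 | 2 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2683 | 25.1%
1 | 3630 | 34.0%
2 | 2070 | 19.4%
3 | 968 | 9.1%
4 | 789 | 7.4%

Maximum 5 values

Value | Count | Frequency (%)
12 | 1 | < 0.1%
10 | 2 | < 0.1%
8 | 123 | 1.2%
7 | 33 | 0.3%
6 | 163 | 1.5%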

floor
Categorical

Distinct count: 35
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 83.5 KiB
Value | Count | Frequency (%)
- | 2461 | 23.0%
1 | 1081 | 10.1%
2 | 985 | 9.2%
3 | 931 | 8.7%
4 | 748 | 7.0%
5 | 600 | 5.6%
6 | 539 | 5.0%
7 | 497 | 4.6%
8 | 490 | 4.6%
9 | 369 | 3.5%
Other values (25) | 1991 | 18.6%

Length

Max length: 3
Median length: 1
Mean length: 1.18630752
Min length: 1

animal
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 83.5 KiB
Value | Count | Frequency (%)
acept | 8316 | 77.8%
not acept | 2376 | 22.2%

Length

Max length: 9
Median length: 5
Mean length: 5.888888889
Min length: 5

furniture
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 83.5 KiB
Value | Count | Frequency (%)
not furnished | 8086 | 75.6%
furnished | 2606 | 24.4%

Length

Max length: 13
Median length: 13
Mean length: 12.02506547
Min length: 9

hoa (R$)
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1679
Unique (%): 15.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1174.0216984661429
Minimum: 0
Maximum: 1117000
Zeros: 2373
Zeros (%): 22.2%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 170
Median: 560
Q3: 1237.5
95th percentile: 3167.45
Maximum: 1117000
Range: 1117000
Interquartile range (IQR): 1067.5

Descriptive statistics

Standard deviation: 15592.30525
Coefficient of variation (CV): 13.28110483
Kurtosis: 4912.249106
Mean: 1174.021698
Median Absolute Deviation (MAD): 550
Skewness: 69.03938119
Sum: 12552640
Variance: 243119983
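[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
0 | 2373 | 22.2%
400 | 177 | 1.7%
300 | 168 | 1.6%
500 | 164 | 1.5%
600 | 141 | 1.3%
450 | 140 | 1.3%
350 | 137 | 1.3%
700 | 131 | 1.2%
1000 | 125 | 1.2%
2000 | 114 | 1.1%
Other values (1669) | 7022 | 65.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2373 | 22.2%
1 | 30 | 0.3%
3 | 1 | < 0.1%
10 | 1 | < 0.1%
15 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1117000 | 2 | < 0.1%
220000 | 1 | < 0.1%
200000 | 1 | < 0.1%
81150 | 1 | < 0.1%
32000 | 1 | < 0.1%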

rent amount (R$)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1195
Unique (%): 11.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3896.247194163861
Minimum: 450
Maximum: 45000
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 450
5th percentile: 859.1
Q1: 1530
Median: 2661
Q3: 5000
95th percentile: 12000
Maximum: 45000
Range: 44550
Interquartile range (IQR): 3470

Descriptive statistics

Standard deviation: 3408.545518
Coefficient of variation (CV): 0.8748278402
Kurtosis: 4.62422818
Mean: 3896.247194
Median Absolute Deviation (MAD): 1361
Skewness: 1.838877304
Sum: 41658675
Variance: 11618182.55
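[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
2500 | 258 | 2.4%
2000 | 244 | 2.3%
1200 | 237 | 2.2%
3000 | 235 | 2.2%
15000 | 231 | 2.2%
3500 | 216 | 2.0%
1800 | 215 | 2.0%
1500 | 211 | 2.0%
4000 | 202 | 1.9%
2200 | 201 | 1.9%
Other values (1185) | 8442 | 79.0%

Minimum 5 values

Value | Count | Frequency (%)
450 | 1 | < 0.1%
460 | 1 | < 0.1%
500 | 31 | 0.3%
503 | 1 | < 0.1%
505 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
45000 | 1 | < 0.1%
30000 | 1 | < 0.1%
25000 | 1 | < 0.1%
24000 | 1 | < 0.1%
20000 | 5 | < 0.1%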

property tax (R$)
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 1243
Unique (%): 11.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 366.70435839880287
Minimum: 0
Maximum: 313700
Zeros: 1596
Zeros (%): 14.9%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 38
Median: 125
Q3: 375
95th percentile: 1342.8
Maximum: 313700
Range: 313700
Interquartile range (IQR): 337

Descriptive statistics

Standard deviation: 3107.832321
Coefficient of variation (CV): 8.475035134
Kurtosis: 9667.782564
Mean: 366.7043584
Median Absolute Deviation (MAD): 121
Skewness: 96.01359411
Sum: 3920803
Variance: 9658621.736
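[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
0 | 1596 | 14.9%
100 | 181 | 1.7%
50 | 171 | 1.6%
84 | 145 | 1.4%
250 | 131 | 1.2%
42 | 115 | 1.1%
167 | 105 | 1.0%
25 | 103 | 1.0%
59 | 98 | 0.9%
67 | 98 | 0.9%
Other values (1233) | 7949 | 74.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1596 | 14.9%
1 | 26 | 0.2%
2 | 4 | < 0.1%
3 | 12 | 0.1%
4 | 12 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
313700 | 1 | < 0.1%
28120 | 1 | < 0.1%
21880 | 1 | < 0.1%
12500 | 1 | < 0.1%
10830 | 1 | < 0.1%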

fire insurance (R$)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 216
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.300879161990274
Minimum: 3
Maximum: 677
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 3
5th percentile: 12
Q1: 21
Median: 36
Q3: 68
95th percentile: 160
Maximum: 677
Range: 674
Interquartile range (IQR): 47

Descriptive statistics

Standard deviation: 47.76803093
Coefficient of variation (CV): 0.8961959292
Kurtosis: 5.934963027
Mean: 53.30087916
Median Absolute Deviation (MAD): 18
Skewness: 1.970399756
Sum: 569893
Variance: 2281.784779
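[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
16 | 300 | 2.8%
20 | 291 | 2.7%
26 | 270 | 2.5%
22 | 256 | 2.4%
14 | 248 | 2.3%
17 | 248 | 2.3%
23 | 245 | 2.3%
13 | 239 | 2.2%
18 | 220 | 2.1%
19 | 214 | 2.0%
Other values (206) | 8161 | 76.3%

Minimum 5 values

Value | Count | Frequency (%)
3 | 2 | < 0.1%
4 | 2 | < 0.1%
5 | 5 | < 0.1%
6 | 10 | 0.1%
7 | 53 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
677 | 1 | < 0.1%
451 | 1 | < 0.1%
376 | 1 | < 0.1%
338 | 1 | < 0.1%
305 | 1 | < 0.1%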

total (R$)
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct count: 5751
Unique (%): 53.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5490.4869996258885
Minimum: 499
Maximum: 1120000
Zeros: 0
Zeros (%): 0.0%
Memory size: 83.5 KiB

Quantile statistics

Minimum: 499
5th percentile: 1128.55
Q1: 2061.75
Median: 3581.5
Q3: 6768
95th percentile: 15164.5
Maximum: 1120000
Range: 1119501
Interquartile range (IQR): 4706.25

Descriptive statistics

Standard deviation: 16484.72591
Coefficient of variation (CV): 3.002415981
Kurtosis: 3926.019305
Mean: 5490.487
Median Absolute Deviation (MAD): 1842.5
Skewness: 58.96080292
Sum: 58704287
Variance: 271746188.4
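[Figure: Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
2555 | 39 | 0.4%
2633 | 25 | 0.2%
4089 | 21 | 0.2%
1219 | 15 | 0.1%
760 | 12 | 0.1%
1572 | 11 | 0.1%
1117 | 11 | 0.1%
2586 | 10 | 0.1%
1431 | 10 | 0.1%
10840 | 10 | 0.1%
Other values (5741) | 10528 | 98.5%

Minimum 5 values

Value | Count | Frequency (%)
499 | 1 | < 0.1%
507 | 2 | < 0.1%
508 | 1 | < 0.1%
509 | 1 | < 0.1%
545 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1120000 | 2 | < 0.1%
316900 | 1 | < 0.1%
233200 | 1 | < 0.1%
222100 | 1 | < 0.1%
95610 | 1 | < 0.1%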

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, as long as both scale factors are positive (a negative factor flips the sign of r). As a consequence, for points lying exactly on a non-horizontal line, r is +1 or -1 however steep the line is: only the direction of the slope matters, not its angle to the x-axis.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
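The definition above translates directly into code. Below is a minimal sketch with a hypothetical `pearson_r` helper. The five (rent, fire insurance) pairs come from the "First rows" sample of this report and are only illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their
    standard deviations. The 1/(n - 1) factors cancel, so plain sums are used."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


rent = [800, 1112, 1900, 2300, 3300]  # rent amount (R$), sample rows 4, 3, 6, 9, 0
fire = [11, 17, 25, 30, 42]           # fire insurance (R$), same rows
r = pearson_r(rent, fire)
# A positive rescaling plus a shift leaves r unchanged; negating X flips its sign.
print(r, pearson_r([2 * x + 100 for x in rent], fire), pearson_r([-x for x in rent], fire))
```

The three printed values illustrate the location-and-scale property described above.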

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables; in other words, ρ is Pearson's r computed on the ranks.
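Following the definition, ρ is Pearson's r applied to ranks. In this sketch, tied values receive the average of the ranks they occupy, the same convention as pandas' default `rank(method="average")`. The helper names are hypothetical, and `pearson_r` is repeated so that the block stands alone.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # start..end are 0-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For y = x³, ρ is exactly 1 while r stays below 1. This illustrates why ρ is better at capturing nonlinear monotonic relationships.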

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. In its basic form (τ-a), τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
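Below is a direct O(n²) sketch of the τ-a formula above (`kendall_tau_a` is a hypothetical name). Tied pairs count as neither concordant nor discordant here. When ties are present, libraries such as SciPy's `kendalltau` default to the tie-corrected τ-b instead, so results can differ.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in x or in y count as neither concordant nor discordant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant -> 1/3
```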

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for it is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
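Below is a sketch of the bias correction, following Bergsma's (2013) estimator. Let χ² be the Pearson chi-square statistic of an r × k contingency table with n observations. The steps are:

1. Correct φ² = χ²/n to φ²_corr = max(0, φ² - (k-1)(r-1)/(n-1)).
2. Correct the table dimensions to k_corr = k - (k-1)²/(n-1) and r_corr = r - (r-1)²/(n-1).
3. Compute V = sqrt(φ²_corr / min(k_corr - 1, r_corr - 1)).

The helper name and toy data are hypothetical. pandas-profiling's own implementation may differ in details, such as whether a continuity correction is applied to χ².

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two nominal variables.

    Sketch: chi-square is computed from the contingency table without a
    continuity correction, then phi^2 and the table dimensions are corrected.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-square statistic over all r x k cells (missing cells count 0).
    chi2 = 0.0
    for a, na in row_totals.items():
        for b, nb in col_totals.items():
            expected = na * nb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)


# Hypothetical toy data: the two variables determine each other completely.
xs = ["a"] * 50 + ["b"] * 50
ys = ["x"] * 50 + ["y"] * 50
print(cramers_v_corrected(xs, ys))
```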

Sample

First rows

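index | city | area | rooms | bathroom | parking spaces | floor | animal | furniture | hoa (R$) | rent amount (R$) | property tax (R$) | fire insurance (R$) | total (R$)
0 | São Paulo | 70 | 2 | 1 | 1 | 7 | acept | furnished | 2065 | 3300 | 211 | 42 | 5618
1 | São Paulo | 320 | 4 | 4 | 0 | 20 | acept | not furnished | 1200 | 4960 | 1750 | 63 | 7973
2 | Porto Alegre | 80 | 1 | 1 | 1 | 6 | acept | not furnished | 1000 | 2800 | 0 | 41 | 3841
3 | Porto Alegre | 51 | 2 | 1 | 0 | 2 | acept | not furnished | 270 | 1112 | 22 | 17 | 1421
4 | São Paulo | 25 | 1 | 1 | 0 | 1 | not acept | not furnished | 0 | 800 | 25 | 11 | 836
5 | São Paulo | 376 | 3 | 3 | 7 | - | acept | not furnished | 0 | 8000 | 834 | 121 | 8955
6 | Rio de Janeiro | 72 | 2 | 1 | 0 | 7 | acept | not furnished | 740 | 1900 | 85 | 25 | 2750
7 | São Paulo | 213 | 4 | 4 | 4 | 4 | acept | not furnished | 2254 | 3223 | 1735 | 41 | 7253
8 | São Paulo | 152 | 2 | 2 | 1 | 3 | acept | furnished | 1000 | 15000 | 250 | 191 | 16440
9 | Rio de Janeiro | 35 | 1 | 1 | 0 | 2 | acept | furnished | 590 | 2300 | 35 | 30 | 2955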

Last rows

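index | city | area | rooms | bathroom | parking spaces | floor | animal | furniture | hoa (R$) | rent amount (R$) | property tax (R$) | fire insurance (R$) | total (R$)
10682 | Porto Alegre | 160 | 3 | 2 | 3 | 4 | acept | furnished | 850 | 3300 | 220 | 49 | 4419
10683 | São Paulo | 280 | 4 | 4 | 2 | 5 | acept | not furnished | 4200 | 4000 | 1042 | 51 | 9293
10684 | Rio de Janeiro | 98 | 2 | 1 | 0 | 1 | acept | not furnished | 560 | 3900 | 184 | 51 | 4695
10685 | São Paulo | 83 | 3 | 2 | 2 | 11 | acept | not furnished | 888 | 7521 | 221 | 96 | 8726
10686 | São Paulo | 150 | 3 | 3 | 2 | 8 | not acept | furnished | 0 | 13500 | 0 | 172 | 13670
10687 | Porto Alegre | 63 | 2 | 1 | 1 | 5 | not acept | furnished | 402 | 1478 | 24 | 22 | 1926
10688 | São Paulo | 285 | 4 | 4 | 4 | 17 | acept | not furnished | 3100 | 15000 | 973 | 191 | 19260
10689 | Rio de Janeiro | 70 | 3 | 3 | 0 | 8 | not acept | furnished | 980 | 6000 | 332 | 78 | 7390
10690 | Rio de Janeiro | 120 | 2 | 2 | 2 | 8 | acept | furnished | 1585 | 12000 | 279 | 155 | 14020
10691 | São Paulo | 80 | 2 | 1 | 0 | - | acept | not furnished | 0 | 1400 | 165 | 22 | 1587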

Duplicate rows

Most frequent

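index | city | area | rooms | bathroom | parking spaces | floor | animal | furniture | hoa (R$) | rent amount (R$) | property tax (R$) | fire insurance (R$) | total (R$) | count
80 | Porto Alegre | 47 | 1 | 1 | 1 | 1 | not acept | furnished | 400 | 2200 | 0 | 33 | 2633 | 22
153 | São Paulo | 20 | 1 | 1 | 0 | - | acept | furnished | 602 | 1800 | 130 | 23 | 2555 | 14
201 | São Paulo | 45 | 1 | 1 | 1 | 1 | not acept | furnished | 3000 | 5520 | 0 | 70 | 8590 | 9
160 | São Paulo | 20 | 1 | 1 | 0 | 5 | acept | furnished | 602 | 1800 | 130 | 23 | 2555 | 7
187 | São Paulo | 35 | 1 | 1 | 0 | 1 | not acept | not furnished | 250 | 1305 | 0 | 17 | 1572 | 7
195 | São Paulo | 40 | 1 | 1 | 0 | - | not acept | not furnished | 0 | 780 | 17 | 12 | 809 | 7
186 | São Paulo | 35 | 1 | 1 | 0 | - | acept | not furnished | 0 | 1100 | 30 | 14 | 1144 | 6
211 | São Paulo | 50 | 1 | 1 | 0 | - | not acept | not furnished | 0 | 1250 | 34 | 19 | 1303 | 6
68 | Campinas | 110 | 3 | 3 | 2 | - | acept | not furnished | 560 | 3200 | 88 | 49 | 3897 | 5
87 | Rio de Janeiro | 15 | 1 | 1 | 0 | - | acept | not furnished | 0 | 700 | 0 | 10 | 710 | 5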
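The "Most frequent" table lists identical rows that occur more than once. Here `count` is read as the number of occurrences of each row, which is an assumption. Below is a minimal sketch with a hypothetical `most_frequent_duplicates` helper. The toy rows reuse three rows from the table above, but their occurrence counts are made up.

```python
from collections import Counter


def most_frequent_duplicates(rows, top_n=10):
    """Rows occurring more than once, with their occurrence counts,
    most frequent first (ties keep first-seen order)."""
    repeated = [(row, c) for row, c in Counter(rows).most_common() if c > 1]
    return repeated[:top_n]


# Hypothetical toy data: row values copied from the table, counts invented.
a = ("Porto Alegre", 47, 1, 1, 1, "1", "not acept", "furnished", 400, 2200, 0, 33, 2633)
b = ("Rio de Janeiro", 15, 1, 1, 0, "-", "acept", "not furnished", 0, 700, 0, 10, 710)
c = ("Campinas", 110, 3, 3, 2, "-", "acept", "not furnished", 560, 3200, 88, 49, 3897)
rows = [a, b, a, c, a, b]
for row, count in most_frequent_duplicates(rows):
    print(count, row)
```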